PCITMiner: Prefix-based Closed Induced Tree Miner for Finding Closed Induced Frequent Subtrees
Authors
Abstract
Frequent subtree mining has attracted a great deal of interest among researchers because of its applications in a wide variety of domains, including bioinformatics, XML processing, computational linguistics, and web usage mining. Despite the advances in frequent subtree mining, mining the complete set of frequent subtrees is infeasible because the number of frequent subtrees grows combinatorially with the size of the datasets. To provide a reduced, concise representation without information loss, we propose a novel algorithm, PCITMiner (Prefix-based Closed Induced Tree Miner). PCITMiner adopts a prefix-based pattern-growth strategy to find the closed induced frequent subtrees efficiently. Empirical analysis shows that our algorithm significantly outperforms the current state-of-the-art algorithm, PrefixTreeISpan (Zou, Lu, Zhang, Hu and Zhou 2006b).
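To make the notion of a closed induced frequent subtree concrete, here is a minimal sketch. It is not PCITMiner itself: it is a brute-force post-filter that keeps a frequent pattern only if no other listed pattern contains it with the same support. The tree encoding as `(label, children)` tuples, the restriction to ordered trees, and all function names are assumptions made for illustration only.

```python
# Illustrative sketch only: a brute-force closedness filter, not the
# prefix-based pattern-growth procedure used by PCITMiner.
# Assumption: a tree is a tuple (label, (child, child, ...)) and
# sibling order matters (ordered trees).

def matches_at(pattern, node):
    """True if pattern maps onto node root-to-root, keeping
    parent-child edges and left-to-right sibling order."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    i = 0
    # Greedily match pattern children to a subsequence of node children.
    for p_child in p_children:
        while i < len(n_children) and not matches_at(p_child, n_children[i]):
            i += 1
        if i == len(n_children):
            return False
        i += 1
    return True

def occurs_in(pattern, tree):
    """True if pattern is an induced subtree rooted anywhere in tree."""
    if matches_at(pattern, tree):
        return True
    return any(occurs_in(pattern, child) for child in tree[1])

def support(pattern, database):
    """Number of database trees that contain pattern."""
    return sum(1 for tree in database if occurs_in(pattern, tree))

def closed_patterns(frequent, database):
    """Keep patterns with no proper supertree (among `frequent`)
    that has the same support."""
    sup = {p: support(p, database) for p in frequent}
    return [
        p for p in frequent
        if not any(q != p and sup[q] == sup[p] and occurs_in(p, q)
                   for q in frequent)
    ]
```

For instance, if a database contains the tree a(b, c) twice and the tree a(b) once, the single-node pattern a has support 3, but so does a(b), so a is not closed. Only a(b) and a(b, c) survive the filter, and the supports of the discarded patterns can be recovered from them, which is the sense in which the closed set loses no information.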
Similar resources
BOSTER: An Efficient Algorithm for Mining Frequent Unordered Induced Subtrees
Extracting frequent subtrees from tree-structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree structure guided scheme based enumeration approach that systematically...
CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. One important problem in mining databases of trees is to find frequently occurring subtrees. However, because of the combinatorial explosion, the number of frequent subtrees usually grows exponentially with the size of the subtrees. In this paper, we p...
Analysis of glycan recognition sites of viruses
Currently, viruses such as the new swine influenza infect various animals, causing many casualties around the world. Glycans exist on the membrane surface of the virus, allowing it to escape from the immune system of the host. Thus, glycans are key for infecting the host. For example, it is known that glycans containing sialic acid are recognized and bound by the influenza virus in order to infect th...
Closed Pattern Mining from n-ary Relations
In this paper, we address the problem of closed pattern mining from n-ary relations. We propose the CnS-Miner algorithm, which enumerates all the closed patterns of a given n-dimensional dataset in a depth-first manner, satisfying user-specified minimum size constraints. From the given input, the CnS-Miner algorithm generates an n-ary tree and visits the tree in a depth-first manner. We have propos...
IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding
Tree mining has recently attracted a lot of interest in areas such as Bioinformatics, XML mining, Web mining, etc. We are mainly concerned with mining frequent induced and embedded subtrees. While more interesting patterns can be obtained when mining embedded subtrees, unfortunately mining such embedding relationships can be very costly. In this paper, we propose an efficient approach to tackle...
Journal title:
Volume, issue
Pages -
Publication date: 2007